Understanding the genetic architecture of gene expression

Heather E. Wheeler

February 13, 2015

PrediXcan Step 1: Build and Test Predictors

PrediXcan Step 2: Build database of Best Predictors

PrediXcan Step 3: Impute gene expression and test for association with phenotype

Explore the Genetic Architecture of Transcriptome Regulation

Optimizing predictors for PrediXcan also tells us about the underlying genetic architecture of gene expression.

We can ask what proportion of genes have cis vs. trans effects, sparse vs. polygenic effects, and cross-tissue vs. tissue-specific effects.

Primary cohort: DGN

cis vs. trans effects

Estimate the heritability (h2) of gene expression in a joint analysis with two genetic relationship matrices (GRMs): localGRM (SNPs within 1 Mb of the gene) + globalGRM (all SNPs)
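
A minimal sketch of how the two GRMs could be built, assuming the standardized-genotype GRM (the same form as GCTA's off-diagonal elements; GCTA uses a slightly different estimator on the diagonal) and allele frequencies estimated in-sample. The function names and the toy genotypes are hypothetical. Following the wording above, the global set includes the local SNPs.

```python
import random

def grm(genotypes):
    """Genetic relationship matrix A = Z Z^T / M from M SNPs.

    genotypes[k][i] is the allele dosage (0, 1 or 2) of SNP k in individual i;
    z_ik = (x_ik - 2 p_k) / sqrt(2 p_k (1 - p_k)) with p_k estimated in-sample.
    """
    n = len(genotypes[0])
    a = [[0.0] * n for _ in range(n)]
    used = 0
    for snp in genotypes:
        p = sum(snp) / (2 * n)
        if p <= 0.0 or p >= 1.0:
            continue                      # monomorphic SNP: carries no information
        sd = (2 * p * (1 - p)) ** 0.5
        z = [(x - 2 * p) / sd for x in snp]
        for i in range(n):
            for j in range(n):
                a[i][j] += z[i] * z[j]
        used += 1
    if used == 0:
        raise ValueError("no polymorphic SNPs")
    return [[v / used for v in row] for row in a]

def local_and_global(positions, gene_start, gene_end, window=1_000_000):
    """SNP indices for the localGRM (within `window` bp of the gene) and globalGRM (all SNPs)."""
    local = [k for k, pos in enumerate(positions)
             if gene_start - window <= pos <= gene_end + window]
    return local, list(range(len(positions)))

# Toy usage with random (hypothetical) genotypes on one chromosome
rng = random.Random(1)
n_ind, n_snp = 6, 50
positions = sorted(rng.randrange(5_000_000) for _ in range(n_snp))
geno = [[rng.choice([0, 1, 2]) for _ in range(n_ind)] for _ in range(n_snp)]
local_idx, global_idx = local_and_global(positions, 2_000_000, 2_050_000)
a_local = grm([geno[k] for k in local_idx])
a_global = grm([geno[k] for k in global_idx])
```

Because each standardized SNP column sums to zero, every row of such a GRM sums to zero; real analyses use far more individuals and SNPs, and GCTA builds the matrices itself.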

Local (joint) sorted h2 estimates with 95% CI from GCTA

https://github.com/hwheeler01/cross-tissue/blob/master/analysis/sources/heritab_analysis.html

Global (joint) sorted h2 estimates with 95% CI from GCTA

https://github.com/hwheeler01/cross-tissue/blob/master/analysis/sources/heritab_analysis.html

100 permutations to determine expected distribution of h2 estimates
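
A sketch of the permutation idea, assuming that shuffling individuals' expression relative to their genotypes breaks any true genetic signal while keeping the GRM fixed. To stay self-contained it swaps GCTA's REML for Haseman-Elston regression (slope of the product of standardized phenotypes on the GRM entries), which is a different, cheaper estimator; function names and toy data are hypothetical.

```python
import random
import statistics

def he_h2(y, a):
    """Haseman-Elston estimate of h2: slope of z_i*z_j on A_ij over pairs i < j,
    where z is the standardized phenotype (a stand-in for GCTA REML)."""
    m, s = statistics.fmean(y), statistics.pstdev(y)
    z = [(v - m) / s for v in y]
    n = len(z)
    xs = [a[i][j] for i in range(n) for j in range(i + 1, n)]
    ps = [z[i] * z[j] for i in range(n) for j in range(i + 1, n)]
    mx, mp = statistics.fmean(xs), statistics.fmean(ps)
    sxy = sum((x - mx) * (p - mp) for x, p in zip(xs, ps))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def permutation_null(expr, a, n_perm=100, seed=0):
    """expr[g] = expression of gene g across individuals (same order as the GRM a).
    Each permutation shuffles individuals once and applies that order to every gene,
    which breaks the genotype-expression link but keeps gene-gene correlation."""
    rng = random.Random(seed)
    n = len(a)
    null = []
    for _ in range(n_perm):
        order = list(range(n))
        rng.shuffle(order)
        null.append([he_h2([y[i] for i in order], a) for y in expr])
    return null   # null[p][g]: h2 estimate of gene g in permutation p

# Toy usage with a hypothetical random GRM and random expression (no real data)
rng = random.Random(3)
n_ind, n_gene = 20, 5
a = [[1.0 if i == j else 0.0 for j in range(n_ind)] for i in range(n_ind)]
for i in range(n_ind):
    for j in range(i + 1, n_ind):
        a[i][j] = a[j][i] = rng.gauss(0, 0.05)
expr = [[rng.gauss(0, 1) for _ in range(n_ind)] for _ in range(n_gene)]
observed = [he_h2(y, a) for y in expr]
null = permutation_null(expr, a)
```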

Sort the h2 from each permutation
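
One way to compare the sorted curves, sketched under the assumption that the comparison is made rank by rank: sort the observed h2 values and each permutation's values, then take an empirical upper percentile across permutations at every rank. The nearest-rank percentile rule and the function name are assumptions for illustration.

```python
def sorted_null_envelope(observed, null, q_percent=95):
    """Sort observed h2 and each permutation's h2 ascending. At each rank r the
    null envelope is the q-th percentile (nearest-rank rule) of the r-th smallest
    h2 across permutations; returns (sorted observed, envelope, ranks exceeding it)."""
    obs_sorted = sorted(observed)
    perm_sorted = [sorted(est) for est in null]
    n_perm = len(perm_sorted)
    k = (q_percent * n_perm + 99) // 100 - 1   # index ceil(q * n / 100) - 1
    envelope = [sorted(curve[r] for curve in perm_sorted)[k]
                for r in range(len(obs_sorted))]
    n_exceed = sum(o > e for o, e in zip(obs_sorted, envelope))
    return obs_sorted, envelope, n_exceed

# Toy usage: 100 hypothetical permutations of 3 genes
null = [[p / 1000, 2 * p / 1000, 3 * p / 1000] for p in range(100)]
observed = [0.5, 0.05, 0.2]
obs_sorted, envelope, n_exceed = sorted_null_envelope(observed, null)
```

Ranks where the observed curve sits above the envelope point to more heritable genes than expected by chance at that position in the ordering.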

cis vs. trans effects

Try a larger sample to better capture trans effects

Framingham Heart Study

sparse vs. polygenic effects

glmnet solves the following problem \[ \min_{\beta_0,\beta} \frac{1}{N} \sum_{i=1}^{N} w_i l(y_i,\beta_0+\beta^T x_i) + \lambda\left[(1-\alpha)||\beta||_2^2/2 + \alpha ||\beta||_1\right], \] over a grid of values of \(\lambda\) covering the entire range.

The elastic-net penalty is controlled by \(\alpha\), and bridges the gap between lasso (\(\alpha=1\), the default) and ridge (\(\alpha=0\)). The tuning parameter \(\lambda\) controls the overall strength of the penalty.
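
For intuition, here is a minimal sketch of cyclic coordinate descent with soft-thresholding for this objective, assuming the gaussian loss \(l(y,\eta)=\tfrac{1}{2}(y-\eta)^2\), unit weights \(w_i=1\), a single \(\lambda\), and no feature standardization (glmnet itself fits a whole \(\lambda\) path with warm starts and standardizes by default). It is an illustration in Python, not glmnet's code; the function names are hypothetical.

```python
def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def elastic_net(x, y, lam, alpha, max_sweeps=1000, tol=1e-10):
    """Minimize (1/(2N)) sum_i (y_i - b0 - x_i.b)^2
               + lam * ((1 - alpha)/2 * ||b||_2^2 + alpha * ||b||_1)
    by cyclic coordinate descent. x: list of N rows, each with p features.
    Features are centered internally, so the unpenalized intercept drops out."""
    n, p = len(x), len(x[0])
    means = [sum(row[j] for row in x) / n for j in range(p)]
    cols = [[row[j] - means[j] for row in x] for j in range(p)]
    sq = [sum(c * c for c in col) / n for col in cols]     # (1/N) sum_i x_ij^2
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]                          # current residual
    b = [0.0] * p
    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(p):
            col = cols[j]
            # (1/N) x_j . (partial residual with feature j's contribution added back)
            rho = sum(c * r for c, r in zip(col, resid)) / n + sq[j] * b[j]
            new = soft_threshold(rho, lam * alpha) / (sq[j] + lam * (1 - alpha))
            delta = new - b[j]
            if delta != 0.0:
                resid = [r - c * delta for r, c in zip(resid, col)]
                b[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    b0 = y_mean - sum(m * bj for m, bj in zip(means, b))
    return b0, b

x = [[0, 1], [1, 0], [2, 1], [3, 0], [4, 1], [5, 0]]
y = [1 + 2 * row[0] for row in x]
b0_ols, b_ols = elastic_net(x, y, lam=0.0, alpha=0.5)     # no penalty: least squares
b0_null, b_null = elastic_net(x, y, lam=10.0, alpha=1.0)  # lasso with a large lambda: all zero
```

The \(\alpha\) term enters only through the soft-threshold (the lasso part, which sets coefficients exactly to zero) and the \(\lambda(1-\alpha)\) added to the denominator (the ridge part, which shrinks without zeroing).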

http://web.stanford.edu/~hastie/glmnet/glmnet_alpha.html

sparse vs. polygenic effects

For each gene, determine \(\alpha\) with best 10-fold CV predictive performance using cis SNPs.
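
A sketch of per-gene \(\alpha\) selection by 10-fold cross-validation, assuming mean squared prediction error as the criterion (other criteria, such as the squared correlation between predicted and observed expression, work the same way) and a small hypothetical \(\lambda\) grid. It repeats a compact coordinate-descent elastic net so the block runs on its own; all names and the toy gene are hypothetical.

```python
import random

def fit_en(x, y, lam, alpha, sweeps=200, tol=1e-8):
    """Gaussian elastic net by cyclic coordinate descent on centered features
    (no standardization); returns (intercept, coefficients)."""
    n, p = len(x), len(x[0])
    mx = [sum(r[j] for r in x) / n for j in range(p)]
    my = sum(y) / n
    cols = [[r[j] - mx[j] for r in x] for j in range(p)]
    sq = [sum(c * c for c in col) / n for col in cols]
    res = [v - my for v in y]
    b = [0.0] * p
    for _ in range(sweeps):
        biggest = 0.0
        for j, col in enumerate(cols):
            if sq[j] == 0.0:
                continue                          # SNP is constant in this training fold
            rho = sum(c * r for c, r in zip(col, res)) / n + sq[j] * b[j]
            t = max(abs(rho) - lam * alpha, 0.0)  # soft-thresholding
            new = (t if rho > 0 else -t) / (sq[j] + lam * (1 - alpha))
            delta = new - b[j]
            if delta != 0.0:
                res = [r - c * delta for r, c in zip(res, col)]
                b[j] = new
                biggest = max(biggest, abs(delta))
        if biggest < tol:
            break
    return my - sum(m * bj for m, bj in zip(mx, b)), b

def cv_mse(x, y, lam, alpha, k=10, seed=0):
    """k-fold cross-validated mean squared prediction error. The fixed seed gives
    identical folds for every (alpha, lambda), so the errors are directly comparable."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    sse = 0.0
    for f in range(k):
        test = idx[f::k]
        held = set(test)
        train = [i for i in idx if i not in held]
        b0, b = fit_en([x[i] for i in train], [y[i] for i in train], lam, alpha)
        for i in test:
            pred = b0 + sum(bj * g for bj, g in zip(b, x[i]))
            sse += (y[i] - pred) ** 2
    return sse / len(y)

def cv_by_alpha(x, y, alphas, lambdas, k=10):
    """For each alpha: (lowest CV error over the lambda grid, lambda achieving it)."""
    return {a: min((cv_mse(x, y, lam, a, k), lam) for lam in lambdas) for a in alphas}

# Toy gene (hypothetical data): 5 cis SNPs, only the first affects expression
rng = random.Random(42)
x = [[rng.choice([0, 1, 2]) for _ in range(5)] for _ in range(40)]
y = [0.8 * row[0] + rng.gauss(0, 0.5) for row in x]
cv_results = cv_by_alpha(x, y, alphas=[0.05, 0.5, 1.0], lambdas=[0.01, 0.05, 0.2, 1.0])
best_alpha = min(cv_results, key=lambda a: cv_results[a][0])
```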

Predictive performance consistent across most alphas

Predictive performance consistent between \(\alpha\)=0.5 and \(\alpha\)=1

Also tested Polyscore predictive performance using 10-fold CV

\(\widehat{\text{expression}} = \sum_{k} \hat{w}_k \, gt_k\), where \(gt_k\) is the genotype of SNP \(k\)

For each of several P-value thresholds, the single-variant linear regression coefficients (\(\hat{w}\)) of the SNPs passing that threshold are included in the additive model.
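
A minimal sketch of the polyscore, assuming ordinary least squares for each single-variant fit and a two-sided P-value from a normal approximation to the t statistic (reasonable for large samples). As in the formula, the score has no intercept, so it is best evaluated by correlation with observed expression. Function names and the toy data are hypothetical.

```python
import math
import random
import statistics

def marginal_fit(g, y):
    """Single-variant OLS of expression y on one SNP's dosages g.
    Returns (w_hat, two-sided P); P uses a normal approximation to the t statistic."""
    n = len(y)
    mg, my = statistics.fmean(g), statistics.fmean(y)
    sxx = sum((a - mg) ** 2 for a in g)
    sxy = sum((a - mg) * (b - my) for a, b in zip(g, y))
    w = sxy / sxx
    rss = sum((b - my - w * (a - mg)) ** 2 for a, b in zip(g, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0.0:
        return w, 0.0
    return w, math.erfc(abs(w / se) / math.sqrt(2))   # 2 * (1 - Phi(|z|))

def polyscore(train_geno, train_y, test_geno, p_threshold):
    """Predicted expression = sum_k w_hat_k * gt_k over SNPs with marginal P < p_threshold.
    Genotype matrices are lists of individuals, each a list of SNP dosages."""
    weights = []
    for k in range(len(train_geno[0])):
        w, p = marginal_fit([row[k] for row in train_geno], train_y)
        weights.append(w if p < p_threshold else 0.0)
    return [sum(w * gt for w, gt in zip(weights, row)) for row in test_geno]

# Toy data (hypothetical): 20 cis SNPs, SNP 3 affects expression
rng = random.Random(7)
geno = [[rng.choice([0, 1, 2]) for _ in range(20)] for _ in range(200)]
expr = [0.6 * row[3] + rng.gauss(0, 1) for row in geno]
scores = {thr: polyscore(geno[:150], expr[:150], geno[150:], thr)
          for thr in (1e-4, 1e-2, 0.5)}
```

Inside 10-fold CV, the weights would be re-estimated on each training fold and the scores computed on the held-out fold.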

Polyscore (cis SNPs only) predictive performance

LASSO predicts gene expression better than Polyscore

For robustness, consider elastic net (EN, \(\alpha\)=0.5) for PrediXcan

cross-tissue vs. tissue-specific effects with GTEx

Modeling cross-tissue expression

Linear mixed effect model

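```r
library(lme4)

# expr_df (hypothetical name): one row per sample with columns expression,
# SUBJID, TISSUE and GENDER; PEERs stands in for the PEER factor covariates
fit <- lmer(expression ~ (1 | SUBJID) + TISSUE + GENDER + PEERs,
            data = expr_df)

# cross-tissue expression: per-subject random intercepts
fitranef <- ranef(fit)$SUBJID

# tissue-specific expression: per-sample residuals
fitresid <- resid(fit)
```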
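
To show what the two components represent, here is a simplified sketch of the decomposition, assuming the variance components (subject and residual) are already known, replacing the fixed effects with plain tissue means, and ignoring GENDER and PEER covariates. lme4 instead estimates everything jointly by REML, so this only approximates `ranef()` and `resid()`; the function name and toy samples are hypothetical. The subject intercept is the familiar shrunken mean \(\hat{u}_j = \frac{n_j\sigma^2_u}{n_j\sigma^2_u + \sigma^2_e}\,\bar{d}_j\), where \(\bar{d}_j\) is subject \(j\)'s mean deviation from the tissue means.

```python
from collections import defaultdict

def decompose(samples, var_subject, var_resid):
    """samples: list of (subject, tissue, expression) tuples.
    Returns (cross_tissue, tissue_specific):
      cross_tissue[subject] = shrunken (BLUP-style) subject intercept,
      tissue_specific[i]    = sample i minus its tissue mean and subject intercept."""
    by_tissue = defaultdict(list)
    for _, tissue, e in samples:
        by_tissue[tissue].append(e)
    tissue_mean = {t: sum(v) / len(v) for t, v in by_tissue.items()}
    by_subject = defaultdict(list)
    for subj, tissue, e in samples:
        by_subject[subj].append(e - tissue_mean[tissue])
    cross_tissue = {}
    for subj, devs in by_subject.items():
        n = len(devs)
        shrink = n * var_subject / (n * var_subject + var_resid)   # approaches 1 as n grows
        cross_tissue[subj] = shrink * sum(devs) / n
    tissue_specific = [e - tissue_mean[t] - cross_tissue[s] for s, t, e in samples]
    return cross_tissue, tissue_specific

samples = [("s1", "blood", 5.0), ("s1", "lung", 7.0),
           ("s2", "blood", 3.0), ("s2", "lung", 5.0)]
cross_tissue, tissue_specific = decompose(samples, var_subject=1.0, var_resid=1.0)
```

Subjects sampled in fewer tissues get intercepts pulled more strongly toward zero, which is why the random-effect estimate differs from a raw per-subject mean.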

Estimating heritability with GCTA

Tested two genetic relationship matrix (GRM) models for each expressed gene
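
Written as variance-component models, the two GRM models fitted by GCTA REML are sketched below in standard notation, assuming expression \(\mathbf{y}\) has already been adjusted for covariates; \(\mathbf{A}\) denotes a GRM and \(V_P = \sigma^2_{\text{local}} + \sigma^2_{\text{global}} + \sigma^2_e\) in the joint model:

\[
\begin{aligned}
\text{localGRM only:}\quad & \operatorname{Var}(\mathbf{y}) = \mathbf{A}_{\text{local}}\,\sigma^2_{\text{local}} + \mathbf{I}\,\sigma^2_e,
\qquad h^2_{\text{local}} = \frac{\sigma^2_{\text{local}}}{\sigma^2_{\text{local}} + \sigma^2_e},\\
\text{localGRM + globalGRM:}\quad & \operatorname{Var}(\mathbf{y}) = \mathbf{A}_{\text{local}}\,\sigma^2_{\text{local}} + \mathbf{A}_{\text{global}}\,\sigma^2_{\text{global}} + \mathbf{I}\,\sigma^2_e,
\qquad h^2_{\text{local}} = \frac{\sigma^2_{\text{local}}}{V_P},\quad h^2_{\text{global}} = \frac{\sigma^2_{\text{global}}}{V_P}.
\end{aligned}
\]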

First pass: estimated h2 of cross-tissue expression and tissue-specific expression in the 7 tissues with the most samples

GCTA heritability: Y ~ localGRM h2

GCTA heritability: Y ~ localGRM h2 (zoomed)

GCTA heritability: Y ~ localGRM p-values

GCTA heritability: Y ~ localGRM + globalGRM h2

GCTA heritability: Y ~ localGRM + globalGRM SE
